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Abstract 



o 
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^r\' A high-performance parallel algorithm is proposed for modeling the propagation of acoustic and elastic 

^ I waves in inhomogencous media. An initial boundary- value problem is replaced by a scries of boundary- value 

problems for a constant elliptic operator and different right-hand sides via the integral Laguerrc transform. 
It is proposed to solve difference equations by the conjugate gradient method for acoustic equations and by 
the GMRES(fc) method for modeling elastic waves. A preconditioning operator was the Laplace operator 
that is inverted using the variable separation method. The novelty of the proposed algorithm is using 
the Dichotomy Algorithm (Terekhov, 2010), which was designed for solving a series of tridiagonal systems 
^^ I of linear equations, in the context of the preconditioning operator inversion. Via considering analytical 

solutions, it is shown that modeling wave processes for long instants of time requires high-resolution meshes. 
The proposed parallel fine-mesh algorithm enabled to solve real application seismic problems in acceptable 
Cu . time and with high accuracy. By solving model problems, it is demonstrated that the considered parallel 

algorithm possesses high performance and efheiency over a wide range of the number of processors (from 2 
to 8192). 
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^D ' 1. Introduction 



Steadily growing number of processors opens up new opportunities for solving complex applied problems, 
for example, elastodynamic problems [lj,l2;|3|. In this case, quite efficient algorithms are numerical-analytical 
algorithms [4|, |5| , where the solution is represented, via the integral time transformation, as the Fourier series 
in terms of some orthonormal system of functions. The expansion coefficients are determined numerically 



C^ ' as a solution of bounda.ry- val ue p roblems for the elliptic type of equation [y, [Zl, |8| 



ary-vali 
Many publications [9|, |lO|, lUl ll^ ll3|, [ij] , are concerned with development and investigation of parallel 
numerical elliptic differential operator inversion algorithms. Nevertheless, this problem remains quite urgent. 
The explanation is that the steady growth of the number of processors integrated within one computer 
system imposes new demands to scalability of parallel algorithms. For instance, methods effective for a 
small number of processors {p < 32), e.g., the cyclic reduction algorithm [15|, |l6[, become ineffective because 
the communication costs prevail over the computational ones. This necessitates further development of 
parallel numerical algorithms that allow using modern computational resources with the greatest efficient 
factor. 
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Publications [17|, |l8|, ll9|, |20|, |2l|, [2^ propose different approaches to solving elliptic equations of second 
order with inseparable variables, where the iterative process is reduced to multiple Laplacian inversion. 



However, realization of efficient procedure [17ll23l l24l. l25|for the Laplacian inversion requires solving tridiag- 



onal systems of linear equations, which, on a multiprocessor system, is a nontrivial problem. This difficulty 
can be overcome by using the Dichotomy Algorithm j26{ , which was developed for inversion of one and same 
tridiagonal matrix for many right-hand sides. The Dichotomy Algorithm was chosen because for this class 
of problems it ensures almost linear dependence of the speedup coefficient in a wide range of the number of 
processors. In terms of accuracy, the number of arithmetic operations, and the number of communications. 



the Dichotomy Algorithm is practically equivalent to the cyclic reduction method [15|, |27|. However, with 
comparable levels of transferred data, the real time of interprocessor communications for the Dichotomy 
Algorithms is much less. The explanation is that the main communication operation of the Dichotomy Al- 
gorithm, that is, all-reduce-to-one (-I-), possesses the associati ve p roperty, which allows reducing the time of 
interprocessor communications due to their optimization 28|,|29[. In the present paper, taking into account 



the high efficiency of the Dichotomy Algorithm, we will consider the possibility of using it within the scope 
of numerical-analytical approach for modeling the propagation of acoustic and elastic waves. 

The peculiarity of the Dichotomy Algorithm is that it was designed for solving problems with the same 
tridiagonal matrix and different right-hand sides. Choosing the integral transformation, we considered the 
fact that prior to solving tridiagonal systems, it is necessary to perform preparatory calculations with a 
volume 0(7V), where N is the dimension of the system of equations. Really, after applying the time Fourier 
transform to the acoustic equation we obtain the boundary- value problem for the Helmholtz equation 

Au„ 4- fc^u„ = /„, n = l,2,... . (1) 

In this case, the dependence of the differential operator on the number of calculated harmonic prevents 
effective usage of the Dichotomy Algorithm because only one right-hand side will correspond to the same 
matrix. The exception is the case of Toeplitz tridiagonal matrices [30| for which the volume of preparatory 
computations is 0{N/p -\- \0g2p) rather than 0{N), where p is the number of processors. Thus, for solution 
of problem ([T]) in the Cartesian coordinate system, the Dichotomy Algorithm can be applied, e.g., in the 
context of the variable separation method that requires inversion of Toeplitz (quasi- Toeplitz) matrices. 

In the present work, we consider media of 2.5D geometry. In this case, in the cylindrical coordinate 
system, for the Laplace operator inversion, it is necessary to solve tridiagonal SLAEs of the general form. 
For this case, we considered the Laguerre transformjjj, after applying it to the acoustic equation, it is 
required to invert one and the same differential operator for all right-hand sides 

n-l 

Au„ - A^u„ = /„ + ^ a„,iMi, 71 = 1,2,... an.i.XeR. (2) 

The fact that the preparatory computations in the context of the Dichotomy Algorithm are performed 
once for all right-hand sides allows one to neglect preparatory expenses. Thus, it becomes possible to use 
the Dichotomy Algorithm for solving problem ^ by methods demanding inversion of general tridiagonal 
matrices. 

In the present paper, using the Laguerre transform and the Dichotomy Algorithm we considered the 
high-performance parallel algorithm for modeling acoustic and elastic waves in 2.5D media. At present, 
applied geophysics problems have to be solved for steadily increasing times and recording systems. On the 
other hand, improvement of practical observing systems necessitates increasing the calculation accuracy. 
In the paper, by considering analytical solutions we have shown that modeling wave processes for longer 
instants of time requires higher resolution meshes. We illustrated the possibility of effective using thousands 
of processors within one calculation. This enabled practical real-time and high-accuracy computing on 
current computers. 



2. Problem Statement and Solution Algorithm 

2.1. Acoustic Equation 

In the cylindrical coordinate system (r, z), in the half-space z > we will consider the problem of 
modeling the propagation of acoustic waves from a point source 

p(x)^(x,t)=V[K(x)Vu(x,t)] + ^^^^i^i^/(t), t>0, ^={r,z). (3) 



Suppose that problem ([3]) is solved with homogeneous initial conditions 



\t=o 



du 
'dt 



(4) 



Assume that at z = the surface is free, and the auxiliary boundaries are entered along the coordinates r 

and z 

du 



dz 



Z = 0d2 



'\r=h 



0. 



(5) 



The boundaries r = li and z ^ I2 are chosen such that waves reflected from them do not arise for the 
calculated instant of time. In addition we demand that 



du 
dr 



= 0. 



Let us represent for the solution of problem ([S])-® as the Fourier-Laguerre seriesj; 

/>oo 

i?^(x) = / u{x,t){ht)-'il^{ht)dt 
Jq 

with the inversion formulas 

00 



(6) 



(7) 



(8) 



where l^{ht) are the orthonormal Laguerre functions [31|, which are represented via classical Laguerre 
polynomials as follows 



I'^niht) 



hm\ 



{m + a)! 



{ht)^e-^L'^{ht). 



Here, m is Laguerre polynomial degree and h is the transformation parameter. The necessary and 
sufficient parameter for satisfying the initial data is a > 2 (a is the order of Laguerre functions). 

As a result, the initial boundary-value problem (121)-® is reduced to the boundary-value problems in the 
spectral domain 



V[.c(x)Vi?„(x)] -p(x)^i?,„(x) = __L ^(^ ''°\ f„, + p{^)h'. 



dR„ 



dr 



dR„ 



r-O 



dz 



Z=0,l2 



2n 



- Rni\r=l^ ~ 0, 



{m + a)\ 



^(m-fc) 



A;=0 



{k + ay. 



k\ 



Rki^), 



(9) 



where /„ = /„" f{t){ht)-^l'^{ht)dt. 

This method can be considered as an analog of the spectral-difference method, based on the Fourier 
transform [8|, but in this case, but the role of "frequency" belongs to the parameter m that determines the 
degree of the polynomials. Contrary to the Fourier method, the harmonic separation parameter is present 
only in the right-hand side. 



2.2. Elastic Medium 

To describe the propagation of elastic waves in a inhomogcneous half-space, we will consider the equations 
of motion in the cylindrical coordinate system [3| 






(A + /i) V (V • W) + ^V^W + VA (V • W) + V/i X (V X W) + 2 (V^i • V) W + pF. (10) 



Here, W is the displacement vector, A > and /x > are Lame coefficients, F is the force vector describing 
the action of space-localized axially symmetric source. 

Let us consider the case of the cylindrical coordinate system (2.5D), where W = (u,., u^)'^ , F = [Fr,Fz) , 
A = X(r,z),n = fJ-(r,z) and p = p{r,z). Assume that at z = , the surface is free, with auxiliary 
boundaries along the coordinates r and z, as in the case of the acoustic equation. Problem pH)) is solved 
with homogeneous initial conditions. 

Represent the solution of problem flU)) as the Fourier-Laguerre series 



.(X,t) = (/lt)t ^ Q„(x)C,(/li), 7.,(x,i) = [ht)^ Y. Urni^)l^{ht). 



(11) 



m— 



m=0 



As a result, defining the expansion coefficients Qm and [/,„ necessitates solving a number of problems of 
the form 



d_ 
dr 



{2p + A) — — + A — — + 

or \ oz r 



d_ 

dz 



dQm , dU„ 



dz 



dr 



2m f dQrr, 
r \ dr 



-pFrfm + ph 



{m + a)l 



ld_ 
r dr 



rp 



[dQm , dU„ 



\ dz 

pFzfm + ph^\ 



dr 



k=0 

d_ 

dz 



^("^-fc)-\/ t, <^fc. 



fc! 



(A + 2p) 



dU„ 



fdUm ^ Qr. 

\ dr r 



P-Ur. 



E(— ^■)^/^^^^- 



(m + a)! '^-^ V A:! 

where the boundary conditions on the free surface take the form[l|, |2|, |3| 

dUm , dQr. 



"^rz 



dz dr 



0, 



z=0 



a..^{X[^^ + — +iX + 2p)^- 



= 0. 



2.3. Approximation of equations 

On a rectangular mesh Co = Cjr 'x ^z = ^Y^l , where 

Cjr = {r, = (i - 0.5)/ir, i = 1, ..., Nr, K ^ li/{Nr - 0.5)} , 
oJz = {zk - (fc - 0.5)/i„ fc = 1, ..., N,, h, = l2/{N, - 0.5)} , 

conform problem ^ with the difference problem 






(12) 

(13) 
(14) 



^ym = /, m = l,2,..., A:H^H, (15) 

where the diflference operator A ^ A* > is given by a scheme of the second order of accuracy [y, |32| 



A,.y 



(Ar + Az) 2/r„ - w{x)y„ 






{aiVf)r ' 1 < i < A^i - 1 



, A;,J/ 



x), x g w, 

1 



/l. 



02 y^ 



I h. 



-022/5 



fc==l 



(a22/.-), , l<k<N2-l 
1 



fc = A^s 



(16) 



(17) 



ai(i, fc) = r^K (ri, Zfc) , 02(1, fc) = r^K {ri,Zk) , w(i, k) == /9(rj, z^)— r^ 



l*,JJ = 



where r^ 



1 (5(J- Jo,.7 -Jo) 



27r 



/„, + p(x)/i^ 



(tti + a)! 



^(m-fc) 



fe=0 



(fc + a)! 



/c! 



2/fc(x), Xq = (io/lr, jo^z)- 



(18) 
0.5ft.r, -Zfc = 2fc +0.5/1^" 2/r, ?/z and y^, J/z are the "backward" and "forward" difference 



W- 



relationships with respect to z and r |17l . |6[ . The boundary condition on the side r 



is approximated 



exactly yNi.k = 0, fc = 1, ..., A'^2- For solving problem p6p , we use the conjugate gradient method 33 1. 
Conform the problem (jlOp on a mesh w with the difference problem 



Cym = f, 171=1,2, 



C -.H 



H. 



(19) 



A number of works [3J, |35[ describe the problem of constructing the discrete analog of problem ((T^ . For 
this reason, in the present paper we performed approximation by the finite- volume method with second order 
of accuracy. For solving problem ([T^ . by virtue of the non self-adjoint difference operator C, we will use the 
GMRES(fc) method, where k is the restart parameter [33|. Note, when using the Laguerre transformation, 
the difference operator is always positive-definite. This guarantees convergence of the GMRES(fc) method 
for any k > lj33|. 

2.4- Preconditioning 

By choosing a preconditioning procedure, one can affect substantially the convergence of iterative al- 
gorithms of solving a system of linear equations and, as a result, the elapsed time. Besides standard 
requirements jlTl |33| upon a preconditioning operator, such as 



energy equivalence of operator B to operator A in the sense of inequalities [^ 

-fi{Bu,u) <{Au,u)<j2{Bu,u); < 71 < 72, A = A* > 0, B = B* > 0, 



(20) 



71 = mm ■ 



{Ax,x) 



72 = max 



{Ax,x) 



x^o {Bx,x)^ '^ x^o {Bx,x) 
operation of inversion of operator B must be less time-consuming than for operator A, 



^For the non self-adjoint case, see, e.g., [171133.1^ 



we require efficiency of a procedure preconditioning of operator inversion on a multiprocessor computer 
system. Since not all preconditioning procedures can be efficiently implemented with the use of hundreds of 
processors (e.g., ILU expansion [33|), the latter requirement drastically limits the class of possible precon- 
ditioners. 

In [26| , based on the Dichotomy Algorithm, the author proposed a high-performance parallel implemen- 
tation of the variable separation method 17|, |2J, |37[ for the Laplace operator inversion. The use of the 
Dichotomy Algorithm for solving tridiagonal systems of linear equations ensures linear dependence of the 
speedup coefficient on the number of processors. Thus, for problem (fT5|) . following works |17l.ll8Lll9l.l2Cll.l21. 
|22|, as the preconditioner operator we will consideijj 



with the coefficients 



B = K + K - d 



ai{i,k) = TiK, a2{i,k) = TiK, d{i,k) = rt—p. 



(21) 



For problem (|19p . the preconditioner is given as 



where i3i = A^ -|- A^ 





Bi 





K = 











B2 


d with the coefficients 







(22) 



ai(i,fc) = ri(A-f 2/x), 02(1, fc) = r^/i, d(i, fc) 
and B2 = K,. + Kz — d with the coefficients 






(A + 2n) 



ai(i,fc) = r^/i, a2(i,fc) = ri(A + 2^), d{i,k) ^ r.i—p 

By virtue of the assumption that the contrast of the medium is moderate and the use of a supercomputer 
implies a great number of mesh nodes, this class of preconditioners enables a good convergence rate. More- 
over, the sought solution will be achieved in the number of iterations, which does not practically depend on 
the number of mesh nodes [l7| . 

Since in the proposed scheme of solution of to problem the main computational and communication costs 
fall on the preconditioner inversion, the algorithm efficiency, as a whole, is determined by performance of 
the parallel procedure of solution of the problems Bay = (f>. 



3. Numerical Experiments 

3.1. Parallel Performance 

For estimating the performance of the proposed algorithm, using Fortran-90 and the MPI paradigm, we 
implemented numerical procedures for solving problems ^ and (jlOp . The Fast Fourier transform, which 
is necessary for the preconditioner inversion, was done using FFTW library j38l |: the tridiagonal systems 
of linear equations were solved using the Dichotomy Algorithm [2 a . |30| . Calculations were performed on 
MBC-lOOk supercomputer (from the Intcrdcpartment Supercomputer Center of the Russian Academy of 
Sciences) and on NKS-30t supercomputer (from the Siberian Supercomputer Center of the Siberian Branch 
of the Russian Academy of Sciences). The computer are based on Intel Xeon four-core processors operating 
at 3 GHz and connected via the Infiniband communication medium. 



^Introduce the notation / = 5 (minj-gQ /(x) -|- max^^gg /(x)) . 



Tabic [T] and Fig. [T]a represent measurement results of performance for the conjugate gradient method; 
Table [2] and Fig. [TJb show those for the GMRES(IO) method. Implementing these algorithms, we achieved 
a nearly linear dependence of the speedup on the number of processors for meshes of different resolution. 
Within one calculation, we managed to involve a considerable number of processors (from 1024 to 8192) 
with an efficiency of 90% to 50%, respectively. The achieved performance and scalability arc provided due to 
using the Dichotomy Algorithm in the context of the parallel preconditioner inversion. Thus, the algorithm 
will substantially increase the efficiency of usage of supercomputer computational resources in solving elliptic 
equations. Thus, in solving applied geophysics problems. 
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Table 1: Calculation time (T) and speedup (S) versus the number of processors for one iteration of the CG method. 
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Table 2: Calculation time (T) and speedup (S) versus the number of processors for one cycle of the GMRES(IO) method. 

Dupros et al. [39| considered a parallel algorithm for solving the dynamic problem of the elasticity theory. 
They demonstrated the possibility of using 1024 processors. Up to 256 processors, the authors have obtained 
a high speedup; however, the algorithm efficiency was lower in the range from 256 to 1024 processors. Our 
algorithm ensures a high efficiency in the range from 64 to 8192 processors, the software implementation 
been much simpler. 

As a result of numerical experiments it has been found that the execution time of the first iteration for 
the CG and GMRES(fc) methods is several times greater than that of subsequent ones. This is explained by 
application of dynamic optimization of interprocessor communications on the level of MPI-Reduce(" +" ) after 
repeated execution of the main communication operation " +" in the context of the Dichotomy Algorithm. 
In this case, due to the associative addition, the order of processor exchanges is set such that to minimize 
as much as possible the communication time. Thus, the possibility of applying the algorithms of dynamic 
optimization of the communication interactions ensures the high performance of the Dichotomy Algorithm. 
We should note that for the cyclic reduction method, a fixed order of elimination of unknowns prevents 
optimization of the communication interactions to a full extent. For this reason, in practice, the Dichotomy 
Algorithm possesses a much higher performance than the cyclic reduction method. 

It is known that the efficiency of variational methods for solving SLAEs on a supercomputer decreases 
because of intensive communication interactions while computing|| • Jl on distributed data. This problem 
can be solved by means of modifications of the known algorithms [36[. Let us compare estimates of the 
communication time for computing j| • || for the CG and GMRES(fc) methods and the communication time 
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(b) Dependence of calculation time of one cy- 
cle for the GMRES(IO) method for meshes of 
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of the Dichotomy Algorithm: 



-i|| ■ II , all — reduce 



n^ Dichotomy 
P 



21og2(p)a + H^(7 + 2/?), 



: [log2(p) + 1] \og,ip) + I {\og,ip) - ^) (7 + 2/3) . 



From, this it follows that for computer systems with a low latency (a) and for Z 3> 1, the communication 
time for calculating || • || is insignificant, compared to the communication costs of the Dichotomy Algorithm. 
Thus, the chosen precondition procedure does not need modifications of the CG and GMRES(fc) methods. 



3.2. Acoustic Waves 

The use of mesh methods in spatial derivative approximation cause a numerical effect called a "phase 
error" [8|. In modeling wave propagation processes for long instants of time, this effect determines consider- 
ably the accuracy of the solution. For this reason, from the view of practice, an urgent problem is choosing 
the number of mesh nodes per characteristic wavelength. The high performance of the proposed algorithm 
allows one to estimate the accuracy of solution for meshes with a high resolution {ha = 1/lOOA -^ 1/150A). 

Tables 1 and 2 show that calculation of acoustic waves requires much less count time than that of elastic 
waves. Hence, we first consider the problem of modeling acoustic wave propagation in a homogeneous 
medium p, k = const. This made it possible to investigate the accuracy of solution for meshes with much 
more nodes and with less computational costs. 

For problem (|3]), a point source was situated at the origin of coordinates; the time dependence was given 



as 



fit) = exp 



(2^/o(t - io))' 



r 



sm{2Trfo{t-to)), 



(23) 



where /o = 30Hz, ^o = 0.2s, 7 = 4. 

Approximation of Eq. © was done on the uniform mesh ui with A^i = A^2 = 2'^ nodes, k ~ {12, 13, 14, 15}. 
The number of addends in series (|S]) was n = 3000; the expansion parameters were a = 9, h = 400. The 
distances were measured in wavelength A . 



The time dependencies of the wavcficld amphtudcs for four receivers situated on the free surface at 
different lengths from the source arc represented in Fig. ^ It is seen that the accuracy of the obtained 
solution for different instants of time depends substantially on the number of mesh nodes per characteristic 
wavelength. For instance, for first instants of time, for achieving a reasonable calculation accuracy, the mesh 
with a space step hr = hz = 1/40A is sufficient (Fig. [5Ja); for longer time intervals it is required to decrease 
the mesh step in order to keep a reasonable level of the calculation accuracy. 
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Figure 2: The time dependence of solution u{xi,t) for the acoustic equation, where x; = (Ri,0), i = 1,2,3,4. 



Figure [3] represents dependence of the accuracy of the obtained solution on the receiver position for 
meshes with different resolution: 



e(xO = 



\ /oM'«exact(Xi,t)]^dt 



, Xi = {ihr,0), i = l,..,Ni 



where Uexactii^,0,t) — is the exact solution and Uh is the numerical solution obtained 

on the mesh with the space step hr = h^ = h. 

It can be easily found that when the time interval of modeling is increased m times, the space step must 
be decreased ~ ^fm times; this agrees with theoretical estimates for approximation methods of the second 
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Figure 3: Dependence of the solution accuracy on the position of receiver 1 13.211 for the meshes of different resolution. 



order of accuracy |8|. Thus, for acoustic wave simulation for long time intervals, it is necessary to use meshes 
with a sufficient number of nodes in order that numerical effects caused by the model resolution do not 
predominate. 

We will note that for solving the problem, higher-order schemes [4C 



34| are suitable. In the context of 



the parallel algorithm, increasing approximation order does not cause loss of efficiency because the precon- 
ditioning for higher-order schemes can be done with the second order. Naturally, in this case the number 
of iterations for the CG and GMRES(fc) methods for achieving the desired accuracy will be a bit more, but 
the behavior of the dependence of the speedup on the number of processors will not change. 

3.3. Solid layer over Solid Half Space 

Although early results on elastic waveficld modeling have been obtained long ago [41|, [421, however, 
they were rather qualitative because of a large step of the space mesh h = 1/5A -;- 1/2A. Considerably 
increased computer performance and also development of multiprocessor computer systems have made it 
possible to increase the calculation accuracy 43|, |4J, |45| ■ However, in spite of available theoretical estimates 
of the dependence of solution accuracy on mesh step [8|, the problem of practical choosing a space step 
of meshes is still urgent. By solving the acoustic equation, it was illustrated that calculations for long 
instants of time require meshes with many nodes. Taking into account that the proposed parallel algorithm 
possesses high performance, we will analyze issues of accuracy for problem (|10p for meshes h = hr = hz = 
{1/lOAs, l/20As, l/45As, l/90As}, where As — minT4//o- Here, Vg is the 5-wavc propagation velocity and 
/o is the source frequency. 

Let us consider a problem on elastic wave propagation in a thin layer whose scam thickness is comparable 
with the wavelength (Fig. Ula). The wavefield source is a source of the type of "center of pressure" 3- 



2tt dr 



Sir) 



S{z -d), F, 



1 5{r) d 
27r r dz 



S{z~d). 



(24) 



The time dependence of the pulse f{t) was determined in (P^ . where /o ~ 30Hz, to = 0.2s and 7 = 4. The 
source is placed at the depth d = 10m. In the calculations, the instants of time were t e (0, 5]s. The number 
of terms of series pT|) was n = 2000 with the parameters a = 8 and h = 600. 

In problems of simulation of wavefields, in particular, seismic ones, the governing factor is choosing model 
problems to estimate accuracy of numerical algorithms. A common method applied for layered media is the 



method proposed in [46|,|47[ and extended in |48l.l49|. etc. The drawback of the method is that it introduces 
interference (artifacts). The use of minor matrices [50| made it possible to extend applicability of the matrix 
method, but did not eliminated artifacts. In the present paper, we evaluate the accuracy of the proposed 
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parallel algorithm with the use of the approach described in [5l|, |52| . The essence of the method is that 
the sought-for boundary-value problem of second order in the spectral domain is reduced to two Cauchy 
problems of first order, to which there exists a stable analytical solution. As a consequence, this made it 
possible to remove all constraints on the powers of layers, frequencies, and recording systems. Comparison 
of the modeling results and results obtained by means of the analytical method made it possible to estimate 
the dependence of the numerical solution accuracy on the space mesh step. 

The medium model and a snapshot of the wavefield for the component Uz{r, z) at i = 3s. are represented 
in Fig.Hla. FiguresHlb,c and Figs.[5]a,b show the component Uz as a function of time for a receiver situated 
on the free surface at r = 1500. 

Exact 
\/\0K 




1,4 1,5 1,6 

Time sec. 



Exact 

1/45^. 




1,4 1,5 1,6 

Time sec. 



Figure 4: (a) A snapshot for the displacement vector component Uz{z,r) at i = 3s in the presence of a thin layer. The time 
dependence of amplitude t G [l.l,1.9]s for detector receiver n^ (1500m,, 0) for calculations on the meshes: (b) Nr X Nz = 
{4096 X 4096, 8192 x 8192}; (c) Nr X N^ = {16384 x 16384, 32768 x 32768}. 

Figure [Hb and Fig. [5]a evidence that the meshes with steps 1/lOAs and l/20As ensure no accuracy. 
Thus, the mesh with l/45As or l/90\s can ensure an acceptable level of accuracy for initial instants of times 
(Fig. IHb and Fig. [5ja). For final instants of time (Fig. |4lc and Fig. \E[h) the calculations have to be done on 
the mesh with l/90As . The results of evaluating the accuracy of solution for the problem of the elasticity 
theory are in good agreement with the results obtained for the acoustic equation. Therefore, we can state 
that the main factor determining the accuracy of solution in wave process simulation is the number of mesh 
nodes per wavelength. Also, modeling of real space-time scales requires modeling meshes with a higher 
resolution. 
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Figure 5: The time dependence of amplitude t £ [4, 5]s for detector receiver u-(1500m, 0) for calculations on the meshes: (a) 
NrXN2 = {4096 X 4096, 8192 x 8192}; (b) Nr X N^ = {16384 x 16384, 32768 x 32768}. 



3.4- Marmousi 

For illustrating the ability of the proposed algorithm to perform (for an acceptable time) elastic wavefield 
simulation for real application problems, we will consider problem (jlOp for the Marmousi medium (Fig. [6la) 

The calculations were done for t e (0, 6]s on the meshes with A^^rXiV^ = {8192x2048, 16384x4096, 32768x 
8192} nodes, which corresponded to a space step hr = h^ = {1.5m, 0.75m, 0.375m}. The wavefield was 
modeled from a source of the type of center of pressure (|23D . (P^ with the parameters /o = lOHz, io = Is 
and 7 = 4. The number of terms of series ([TT|) was n = 1200 with the parameters a = 8 and h — 300. 
Figure [6lb shows a snapshot of a wavefield for the displacement vector component Ur{r,z) &% t — 6s; 
Figs. [71a,b represent dependencies of the component Ur(r,z) along straight lines " Slice- R" and "Slice-Z" 
. Comparing results obtained for different meshes, we conclude that an acceptable accuracy for the final 
instant of time is achieved in calculations with the mesh space step hr = hz ~ 0.75m, which corresponds to 
{Nr X Nz} = {16384 X 4096} nodes. For this mesh, according to Table[31 computing will take 4.7 hours with 
1024 processors. The efficiency is about 90%. We should note that sometimes it is reasonable to perform 
computing using more processors, but with a lower efficiency, because in this case the amount of storage 
is increased. This makes it possible to choose greater values of k for the GMRES(fc) method and, thereby 
ensure a higher convergence rate[33|. 
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256 


512 


1024 


2048 


4096 


Mesh 


Time Hours 


8192 X 2048 


3.1 


1.6 


1.4 


2.4 


4.2 


16384x4096 


17 


8.6 


4.7 


3.8 


4 


32768 X 8192 


68 


34 


19.6 


12 


11 



Table 3: Calculation time versus the number of processors for meshes of different resolution for the Marmousi medium. 
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Figure 6: (a) Marmousi model (z > 500m): P,S - wave velocities; (b) a snapshot for the displacement vector component 

Ur{r, z) at t = 6 s. 



The numerical experiments have shown that the proposed algorithm enables not only to perform mod- 
eling, but also to solve real application geophysics problems. For doing so, both a supercomputer with a 
moderate number of processors (64 -^ 256) and multiprocessor systems integrating thousands of computing 
elements are used effectively. 
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Figure 7: Dependence of the field amplitude for the component «,. at i = 6 along straight lines (a) Slice-Z and (b) Slice-R. 



4. Conclusions 

We have proposed the parallel algorithm for solving an acoustic equation and dynamic problem of the 
elasticity theory in a cylindrical coordinate system (2.5 D). The Laguerre time transform was used to 
perform changing from the initially boundary- value problem to the problem of inversion of the same elliptic 
second-order operator for different right-hand sides. The difference equations resulting from elliptic operator 
approximation were solved by the CG method or the GMRES method. Choosing the Laplace operator as the 
preconditioning one allowed for a high convergence rate of the iterative process for media with a moderate 
contrast. 

The nearly linear dependence of the speedup and the high scalability of the parallel algorithm on the 
number of processors were ensured due to the Dichotomy Algorithm in the context of the variable separation 
method for inverting the preconditional operator. The proposed algorithm has validated its efficiency in 
calculations with 64 to 8192 processors. Thus, the high performance of the Dichotomy Algorithm and its 
simple implementation enable efficient parallelization of economic numerical procedures that require multiple 
solution of tridiagonal systems of equations. 

The main conclusion is that the wave process modeling for longer time intervals requires increasing 
number of space mesh nodes. This causes the necessity of applying high-performance computer systems for 
solving application problems. It has been shown that the proposed parallel algorithm, which based on the 
known economic numerical methods and the Dichotomy Algorithm, makes it possible to efficiently involve 
thousands of processors within one calculation. This enables to perform practical calculations for real models 
of media, times, and distances with the desired accuracy. 
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